Unsupervised language model adaptation

Authors

  • Michiel Bacchiani
  • Brian Roark
Abstract

This paper investigates unsupervised language model adaptation from ASR transcripts. N-gram counts from these transcripts can be used either to adapt an existing n-gram model or to build an n-gram model from scratch. Experimental results are reported on a particular domain adaptation task: building a customer care application starting from a general voicemail transcription system. The experiments investigate the effectiveness of various adaptation strategies, including iterative adaptation and self-adaptation on the test data. They show an absolute error rate reduction of 3.9% over the unadapted baseline, from 28% to 24.1%, using 17 hours of unsupervised adaptation material. This is 51% of the 7.7% adaptation gain obtained by supervised adaptation. Self-adaptation on the test data resulted in a 1.3% improvement over the baseline.
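
The idea of adapting an existing n-gram model with counts from ASR transcripts can be illustrated with a minimal count-merging sketch. This is an assumption-laden toy, not the authors' implementation: the abstract does not state the exact adaptation formula, so the function names, the `weight` parameter, the unsmoothed bigram estimate, and the example sentences below are all hypothetical.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count bigrams over whitespace-tokenized sentences, with boundary markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def merge_counts(background, adaptation, weight=1.0):
    """Add weighted adaptation counts to the background counts (count merging).

    `weight` is a hypothetical tuning parameter that controls how strongly
    the in-domain (possibly errorful) ASR counts influence the result.
    """
    merged = Counter()
    for bigram, count in background.items():
        merged[bigram] += count
    for bigram, count in adaptation.items():
        merged[bigram] += weight * count
    return merged


def bigram_prob(counts, history, word):
    """Unsmoothed relative-frequency estimate of P(word | history)."""
    total = sum(c for (h, _), c in counts.items() if h == history)
    return counts[(history, word)] / total if total else 0.0


# Toy data (hypothetical): background text stands in for the voicemail domain,
# and the second set stands in for automatic transcripts of customer care calls.
background = bigram_counts(["please call me back", "call me tomorrow"])
asr_transcripts = bigram_counts(["call me about my bill", "my bill is wrong"])
adapted = merge_counts(background, asr_transcripts, weight=2.0)

print(bigram_prob(background, "me", "about"))
print(bigram_prob(adapted, "me", "about"))
```

In this toy setting, a bigram unseen in the background data receives probability mass once the transcript counts are merged in. A real system would also need smoothing and backoff, which this sketch leaves out.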

Similar resources

Integrating MAP, marginals, and unsupervised language model adaptation

We investigate the integration of various language model adaptation approaches for a cross-genre adaptation task to improve Mandarin ASR system performance on a recently introduced genre, broadcast conversation (BC). Various language model adaptation strategies are investigated and their efficacies are evaluated based on ASR performance, including unsupervised language model adaptation from...

Unsupervised language model adaptation for broadcast news

Unsupervised language model adaptation for speech recognition is challenging, particularly for complicated tasks such as the transcription of broadcast news (BN) data. This paper presents an unsupervised adaptation method for language modeling based on information retrieval techniques. The method is designed for the broadcast news transcription task where the topics of the audio data cannot be pre...

Unsupervised language model adaptation for lecture speech transcription

Unsupervised adaptation methods have been applied successfully to the acoustic models of speech recognition systems for some time. Relatively little work has been carried out in the area of unsupervised language model adaptation, however. The work presented here uses the output of a speech recogniser to adapt the backoff n-gram language model used in the decoding process. We report results for t...

Unsupervised language model adaptation using latent semantic marginals

We integrated the Latent Dirichlet Allocation (LDA) approach, a latent semantic analysis model, into an unsupervised language model adaptation framework. We adapted a background language model by minimizing the Kullback-Leibler divergence between the adapted model and the background model subject to a constraint that the marginalized unigram probability distribution of the adapted model is equal t...

Online adaptation of language models in spoken dialogue systems

The robust estimation of language models for new applications of spoken dialogue systems often suffers from a shortcoming of training material. An alternative to training a language model is to improve an initial language model using material obtained while running the new system, thus adapting it to the new task. In this paper we investigate different methods for online adaptation of language m...

Publication date: 2003